[flang][runtime] Speed up initialization & destruction #148087

klausler · 2025-07-10T23:51:04Z

Rework derived type initialization in the runtime to just initialize the first element of any array, and then memcpy it to the others, rather than exercising the per-component paths for each element.
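As a rough illustration of the idea (not the runtime's actual code), the sketch below initializes element 0 through a per-component path and then copies its bytes into the remaining elements. `Particle`, `InitializeOne`, and `InitializeArray` are hypothetical names, and the sketch assumes contiguous, trivially copyable elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical derived type with default-initialized components.
struct Particle {
  double position[3];
  int id;
  bool active;
};
static_assert(std::is_trivially_copyable_v<Particle>,
    "byte-wise replication is only valid for trivially copyable data");

// Stand-in for the per-component initialization path.
void InitializeOne(Particle &p) {
  for (double &x : p.position) {
    x = 0.0;
  }
  p.id = -1;
  p.active = false;
}

// Initialize element 0 through the slow path once, then copy its bytes
// into every other element instead of repeating the per-component work.
void InitializeArray(Particle *elements, std::size_t count) {
  if (count == 0) {
    return;
  }
  assert(elements != nullptr);
  InitializeOne(elements[0]);
  for (std::size_t j{1}; j < count; ++j) {
    std::memcpy(&elements[j], &elements[0], sizeof(Particle));
  }
}
```

Each `memcpy` here has a source and destination that are distinct elements, so the copied regions never overlap.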

Rework derived type destruction in the runtime to detect and exploit a fast path for allocatable components whose types themselves don't need nested destruction.
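A minimal sketch of what such a fast path could look like, under a deliberately simplified model that is not the runtime's data structures: `Slot`, `TypeDesc`, `Stats`, and `Destroy` are hypothetical, and an instance is assumed to begin with one `Slot` per allocatable component. When a component's element type has no allocatable components of its own, the allocation is freed without visiting its elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// One allocatable component as stored inside an instance (hypothetical).
struct Slot {
  void *data;
  std::size_t count;
};

// Hypothetical type description: an instance begins with one Slot per
// entry in slotTypes; any remaining bytes are plain data.
struct TypeDesc {
  std::size_t bytes; // size of one instance
  std::vector<const TypeDesc *> slotTypes; // element type of each slot
  bool NeedsDestruction() const { return !slotTypes.empty(); }
};

// Counters so the two paths can be observed.
struct Stats {
  std::size_t nestedVisits{0};
  std::size_t frees{0};
};

void Destroy(char *instance, const TypeDesc &type, Stats &stats) {
  assert(type.bytes >= type.slotTypes.size() * sizeof(Slot));
  Slot *slots{reinterpret_cast<Slot *>(instance)};
  for (std::size_t k{0}; k < type.slotTypes.size(); ++k) {
    Slot &slot{slots[k]};
    if (!slot.data) {
      continue; // unallocated component
    }
    const TypeDesc &elementType{*type.slotTypes[k]};
    if (elementType.NeedsDestruction()) {
      // Slow path: every element may own allocations of its own.
      char *element{static_cast<char *>(slot.data)};
      for (std::size_t j{0}; j < slot.count; ++j) {
        ++stats.nestedVisits;
        Destroy(element + j * elementType.bytes, elementType, stats);
      }
    }
    // Fast path falls straight through to here: just release the storage.
    std::free(slot.data);
    ++stats.frees;
    slot = Slot{nullptr, 0};
  }
}
```

In this model the check is made once per component rather than once per element, which is where the saving would come from.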

Small tweaks were made in hot paths exposed by profiling in descriptor operations and derived type assignment.

vzakhari

Looks good to me. Thank you, Peter!

vzakhari · 2025-07-11T23:44:13Z

flang-rt/include/flang-rt/runtime/descriptor.h

Suggested change

// address all elements. It genernalizes contiguity by also allowing

// address all elements. It generalizes contiguity by also allowing

Thanks. I no longer always see this sort of thing.

vzakhari · 2025-07-12T00:03:47Z

flang-rt/include/flang-rt/runtime/descriptor.h

Should we mark such methods with a force-inline attribute? I am not suggesting doing it in this PR, but I wonder if you tried it.

I didn't try it, and I've run into other trouble because something somewhere has a link-time unresolved external reference now to Descriptor::Elements(). Updating soon to leave Elements out-of-line with a new inline InlineElements() that it can call.

I have no clue how this change could have caused a linking issue.

I couldn't track it down either. The unresolved reference is not within flang_rt.runtime or the generated code for the failing test.
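For readers unfamiliar with the pattern mentioned in this thread, here is a small sketch of keeping `Elements()` out-of-line while adding an inline `InlineElements()` that it calls. `MiniDescriptor` is a hypothetical stand-in, not the real `Descriptor` class, and the overflow assertion is an addition made only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <limits>
#include <vector>

class MiniDescriptor {
public:
  MiniDescriptor(std::initializer_list<std::size_t> extents)
      : extents_(extents) {}

  // Defined in the class body, so implicitly inline: hot paths that
  // include this header can have the loop expanded at the call site.
  std::size_t InlineElements() const {
    std::size_t n{1};
    for (std::size_t extent : extents_) {
      assert(extent == 0 ||
          n <= std::numeric_limits<std::size_t>::max() / extent);
      n *= extent;
    }
    return n;
  }

  // Declared here, defined below: existing callers that reference the
  // external symbol continue to link.
  std::size_t Elements() const;

private:
  std::vector<std::size_t> extents_;
};

// Out-of-line definition (in a real project this would live in a .cpp file).
std::size_t MiniDescriptor::Elements() const { return InlineElements(); }
```

With this split, code that needs the external symbol still finds `Elements()`, while performance-sensitive callers can opt into `InlineElements()` explicitly.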

vzakhari · 2025-07-12T00:05:59Z

flang-rt/lib/runtime/derived.cpp

As the chunk size grows, I guess, we can hit cache conflicts depending on the aliasing implementation. Do you know if there is any non-temporal memcpy implementation that we can try?

Maybe I should use

    char *to{rawInstance + *stride};
    char *from{rawInstance};
    for (std::size_t bytes{...}; bytes--; ) { *to++ = *from++; }

Not sure about that. It will probably be the same as memcpy if the compiler vectorizes it. I think memcpy may do better with rep movs on x86. Let's keep it simple with memcpy. If the cache issue pops up anywhere I will experiment with https://clang.llvm.org/docs/LanguageExtensions.html#non-temporal-load-store-builtins

memcpy is not defined for overlapping regions, unfortunately.
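To make the overlap concern concrete, the sketch below contrasts two ways of replicating the first element across a contiguous array. Both functions and their parameters are hypothetical and are not taken from the PR. The byte loop deliberately reads bytes it has just written, which is well defined for a forward byte-by-byte copy but would be undefined if expressed as a single `memcpy` over the same ranges. The doubling variant keeps every `memcpy` source and destination disjoint.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Forward byte loop: source and destination overlap on purpose, so each
// byte written is later read again as a source byte.
void ReplicateByByteLoop(
    char *base, std::size_t elementBytes, std::size_t count) {
  assert(base != nullptr);
  if (count < 2) {
    return;
  }
  char *to{base + elementBytes};
  const char *from{base};
  // Assumed byte count for this sketch: everything after element 0.
  for (std::size_t bytes{(count - 1) * elementBytes}; bytes--;) {
    *to++ = *from++;
  }
}

// Doubling copy: the already-filled prefix [0, done) is copied to the
// region starting at element `done`. Because chunk <= done, the source
// range ends at or before the destination begins, so memcpy is valid.
void ReplicateByDoubling(
    char *base, std::size_t elementBytes, std::size_t count) {
  assert(base != nullptr);
  std::size_t done{1};
  while (done < count) {
    std::size_t chunk{std::min(done, count - done)};
    std::memcpy(base + done * elementBytes, base, chunk * elementBytes);
    done += chunk;
  }
}
```

The doubling variant issues roughly log2(count) calls with growing chunk sizes, which is the situation in which the cache question raised above would become relevant.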


klausler requested a review from vzakhari July 10, 2025 23:51

klausler force-pushed the wq-speedup branch 2 times, most recently from 7597a02 to b9be690 on July 11, 2025 23:53

vzakhari approved these changes Jul 12, 2025


klausler force-pushed the wq-speedup branch from b9be690 to 5fe057c on July 12, 2025 00:23

klausler merged commit 2e53a68 into llvm:main Jul 14, 2025
9 checks passed

klausler deleted the wq-speedup branch July 14, 2025 18:14
